Add t-digest #12961
56a6498 to 2fe3004
tdcmeehan left a comment
Some initial comments. I would also move these classes into their own package for now: com.facebook.presto.operator.scalar.tdigest.
rongrong left a comment
Personally, I don't think changing variable names is necessary. Yes, longer names are more readable, but for algorithms like this, readable names are not that critical to understanding them anyway, and I feel safer seeing that they are unchanged from the original. Just a personal opinion. Also, it's not clear to me why a lot of comments were removed. Keeping these changes to a minimum would make reviewing easier.
It's a bit strange to see hashCode and equals not based on the same variables. What's the reason behind this?
Each instance of a centroid has a unique id, even if two centroids contain the same values. Therefore, if equals compared by id, it would always return false for two distinct instances.
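To illustrate the point about per-instance ids, here is a minimal sketch. It is not the Centroid class from this PR: the class name CentroidSketch and the fields mean and weight are hypothetical. It shows why an id-based equals could never match two distinct instances, and one way to keep equals and hashCode on the same fields, which Java's Object contract requires for equal objects.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; names and fields are assumptions,
// not taken from the t-digest code under review.
final class CentroidSketch
{
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    // Unique per instance, so it cannot be used for value equality.
    private final int id = NEXT_ID.incrementAndGet();
    private final double mean;
    private final long weight;

    CentroidSketch(double mean, long weight)
    {
        this.mean = mean;
        this.weight = weight;
    }

    int id()
    {
        return id;
    }

    @Override
    public boolean equals(Object other)
    {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CentroidSketch)) {
            return false;
        }
        CentroidSketch that = (CentroidSketch) other;
        // Compare by value; the id is deliberately ignored.
        return Double.compare(mean, that.mean) == 0 && weight == that.weight;
    }

    @Override
    public int hashCode()
    {
        // Built from the same fields as equals, so equal objects hash equally.
        return Objects.hash(mean, weight);
    }
}
```

With this shape, two centroids built from the same values compare equal and share a hash code even though their ids differ.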
2cfea70 to 078308b
5c3169a to 532fb28
c6fe433 to 9fba397
Do we really need to update the PNG files, or can we just strip them out?
Having the images makes it a lot easier to understand and is consistent with the QuantileDigest documentation, so I think we should keep them.
There is one commit with a wrong email.
Chatted offline; we will leave most names unchanged.
b83e7a6 to 9a23de4
jessesleeping left a comment
I assume the implementation from Ted Dunning is correct, as I didn't look into the actual logic of the t-digest implementation. The part that wires it into Presto looks good to me.
highker left a comment
- Commit title "Add serialization documents" -> "Add tdigest serialization documents" maybe.
- Can you fix your email address for commit "Add NOTICES file"?
wenleix left a comment
Some comments after a quick glance.
Per offline discussion, it's an intentional decision to first use the existing solution as it is.
I agree that for the initial commit we can keep it as a copy, although I personally prefer to copy it into a different project (e.g. But since @tdcmeehan volunteered to move them to a different project right after this, let's merge as it is 😃
wenleix left a comment
"Add implementation of T-Digest"
Maybe also add a commit message like "This commit copied the original code as it is intentionally. Refactoring will be done in future commits."
This commit copied the original code as it is intentionally. Refactoring will be done in future commits.
b44c56d to 3aabb13
There's some copy-paste in the graffle metadata. Other than that, I think this is good to merge @rongrong
Remove unnecessary functions for Presto t-digest needs.

Add t-digest to Presto. It allows faster and more accurate quantile calculations while using less memory. In reference to issue #12929.
Note
We made an intentional decision to just copy the existing implementation for now. See #12961 (review) and #12961 (comment).